Poster: WikiSourcerer Date: Jun 15, 2013 5:30pm
Forum: texts Subject: OCR questions

I am wondering if there is a way to get the same OCR routine(s) that run during the typical PDF-upload-to-DjVu derivation to run if you upload a DjVu file instead. If I'm following the derive log right, an uploaded PDF has its pages converted to .jp2 files, and each .jp2 is processed with OCR to create an XML file that amounts to the "text layer". Later on, the DjVu variant gets created, and that XML is used to embed the detected text into the new DjVu.

I guess I'm asking whether the same OCR step(s) are possible using an uploaded .djvu file as the "core" file rather than a typical image (.jpg, etc.) or document (.pdf, etc.) file type. TIA.

Poster: Jeff Kaplan (Administrator, Curator, or Staff) Date: Jun 15, 2013 10:10pm
Forum: texts Subject: Re: OCR questions

The formats derived from original files are listed at http://archive.org/help/derivatives.php. The ABBYY OCR file is not derived from DjVu.
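
As a rough illustration of the derive order discussed in this thread, the steps can be modeled as a small dependency table. This is only a sketch, not archive.org's actual derive code: the format labels, the `DERIVED_FROM` table, and the `can_derive` helper are all hypothetical, and the assumption that the DjVu is built from the .jp2 pages plus the OCR XML is inferred from the derive log description above.

```python
# Hypothetical model of the derive order described in this thread.
# The labels and the table are illustrative assumptions, not
# archive.org's real derive configuration.
DERIVED_FROM = {
    "jp2": ["pdf"],              # PDF pages are converted to .jp2 images
    "ocr_xml": ["jp2"],          # OCR runs on the .jp2 pages and produces XML
    "djvu": ["jp2", "ocr_xml"],  # assumed: DjVu is built later, OCR text embedded
}


def can_derive(target, uploaded):
    """Return True if `target` can be produced starting from `uploaded`."""
    if target == uploaded:
        return True
    sources = DERIVED_FROM.get(target)
    if not sources:
        # Nothing in the table produces this format.
        return False
    # Every listed source must itself be reachable from the upload.
    return all(can_derive(source, uploaded) for source in sources)


print(can_derive("ocr_xml", "pdf"))   # OCR is reachable from a PDF upload
print(can_derive("ocr_xml", "djvu"))  # OCR is not reachable from a DjVu upload
```

In this model the OCR step sits upstream of the DjVu, which is why uploading a DjVu alone gives the pipeline nothing to run OCR on.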